ABSTRACT


This note presents a tutorial survey of the mathematics that is used in the
study of linear predictive filtering as applied to the analysis and synthesis of
speech. Speech is modelled as the output of an all-pole filter that is driven by
either a periodic pulse train or white noise. A minimum-mean-squared-error
technique for estimating the coefficients of this filter from speech data is
presented. This technique leads to a set of equations for the coefficient
estimates which can be solved by a computationally efficient recursive technique
known as Levinson's method.

The filter derived by the above mentioned technique can be realized by
any standard technique; however, a particularly interesting realization is in
terms of a digital simulation of a non-uniform acoustic tube. It is shown
that any stable all-pole filter can be realized as an acoustic tube and, moreover,
that the Levinson recursion produces as a by-product exactly the reflection
coefficients needed for such a realization.

The report concludes by showing how the classical theory of orthogonal
polynomials can be applied to the speech analysis/synthesis problem and used
to derive many of the results obtained above by other means.


Accepted  for  the  Air  Force 

Eugene C. Raabe, Lt. Col., USAF

Chief,  ESD  Lincoln  Laboratory  Project  Office 


INTRODUCTION 


The  purpose  of  this  note  is  to  present  a  tutorial  discussion  of  the  mathematical 
theory  underlying  the  analysis  and  synthesis  of  speech  by  means  of  linear 
predictive  filtering.  None  of  the  results  presented  here  are  new,  all  having 
appeared  either  in  the  literature  or  in  research  reports.  The  main  reason 
for the present note is to present these scattered results from a unified
standpoint and, in some cases, to provide more detail than is available in the
literature. 

The basic speech problem under consideration can be formulated as
follows.* Samples of a speech waveform are modelled as being the output of
a digital filter that has been excited by either a series of equally spaced pulses
or white noise, depending on whether the speech is voiced or unvoiced. The filter
is described by the difference equation

$$ s_n = \sum_{k=1}^{p} a_k s_{n-k} + u_n \tag{1} $$

where $u_n$ denotes the $n$th sample of the excitation and $s_n$ denotes the $n$th
sample of speech. The filter order $p$ is assumed to be known on the basis of other
considerations. The transfer function of this filter is easily seen to be
$[H_p(z)]^{-1}$ where

$$ H_p(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{2} $$

from which it is apparent that $[H_p(z)]^{-1}$ is an all-pole filter. The problem at hand
is to use samples of real speech to arrive at an estimate of the filter coefficients
and then to use these coefficients to synthesize a filter that could be
used to regenerate the original speech. The latter operation requires a knowledge
of whether the original speech was voiced or unvoiced, but the problem of how
to obtain this information is not the concern of the present work.

There are many ways one could go about estimating the filter coefficients
from the speech samples. The particular method that will be considered in this
note is a minimum-mean-squared-error technique that will now be described.

* This section is based on references 1, 2, 6, 8, 9.


Select a group of $N+1$ speech samples which, for convenience, will
be numbered from $n = 0$ to $n = N$. Define a sequence $s_n$ by

$$ s_n = \begin{cases} \text{speech sample}, & 0 \le n \le N \\ 0, & n < 0,\ n > N \end{cases} \tag{3} $$

and define the mean-squared prediction error by

$$ e = \sum_{n=-\infty}^{\infty} \left( s_n - \sum_{k=1}^{p} a_k s_{n-k} \right)^2 \tag{4} $$

The quantity $e$ is a function of the assumed values for the $a_k$'s. The
desired estimate for the $a_k$'s is obtained by choosing those values that yield
a minimum value of $e$.

This problem can be solved by first expanding equation (4) as follows*:

$$
\begin{aligned}
e &= \sum_n s_n^2 - 2 \sum_{k=1}^{p} a_k \sum_n s_n s_{n-k}
     + \sum_{k,j=1}^{p} a_k a_j \sum_n s_{n-k} s_{n-j} \\
  &= R_0 - 2 \sum_{k=1}^{p} a_k R_k + \sum_{k,j=1}^{p} a_k a_j R_{k-j}
\end{aligned}
\tag{5}
$$

where the autocorrelation function $R_k$ is defined by

$$ R_k = R_{-k} = \sum_n s_n s_{n-k} \tag{6} $$
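
As an informal numerical check on this expansion, the following Python sketch evaluates the prediction error once directly from equation (4) and once from the quadratic form of equation (5). The function names and any sample values used with them are illustrative assumptions and do not appear in the references.

```python
def autocorrelation(s, max_lag):
    """R_k = sum_n s[n] * s[n-k] for k = 0..max_lag (eq. 6); s is zero outside 0..N."""
    count = len(s)
    return [sum(s[n] * s[n - k] for n in range(k, count))
            for k in range(max_lag + 1)]


def error_direct(s, a):
    """Prediction error of eq. (4), summed over every n that can contribute."""
    p = len(a)

    def sample(n):
        # Eq. (3): the sequence is zero outside the analysis window.
        return s[n] if 0 <= n < len(s) else 0.0

    total = 0.0
    for n in range(len(s) + p):
        prediction = sum(a[k - 1] * sample(n - k) for k in range(1, p + 1))
        total += (sample(n) - prediction) ** 2
    return total


def error_quadratic(R, a):
    """The same error written as the quadratic form of eq. (5)."""
    p = len(a)
    linear = sum(a[k - 1] * R[k] for k in range(1, p + 1))
    quadratic = sum(a[k - 1] * a[j - 1] * R[abs(k - j)]
                    for k in range(1, p + 1) for j in range(1, p + 1))
    return R[0] - 2.0 * linear + quadratic
```

For any finite sequence and any trial coefficients the two functions should agree to within rounding error, which is all that equation (5) asserts.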


It will be convenient to rewrite equation (5) in matrix form as follows:

$$ e = R_0 - 2\,\mathbf{a}^T \mathbf{r} + \mathbf{a}^T \mathbf{R}\,\mathbf{a} \tag{7} $$

where

$$ \mathbf{a}^T = [a_1, \ldots, a_p], \qquad \mathbf{r}^T = [R_1, R_2, \ldots, R_p] \tag{8} $$

and the correlation matrix $\mathbf{R}$ has as its $(i,j)$th element $R_{i-j}$. Note that
because $\mathbf{R}$ is a correlation matrix, it is positive definite and, therefore,
non-singular.

* All sums without limits will henceforth be assumed to run from $n = -\infty$ to $n = \infty$.

Completing the square in equation (7) yields the result,

$$ e = (\mathbf{a} - \mathbf{R}^{-1}\mathbf{r})^T \mathbf{R}\, (\mathbf{a} - \mathbf{R}^{-1}\mathbf{r}) + R_0 - \mathbf{r}^T \mathbf{R}^{-1} \mathbf{r} \tag{9} $$

Equation (9) may be verified simply by multiplying out the quadratic form and
cancelling the appropriate terms. The desired minimization can now be
performed by noting that since $\mathbf{R}$ is positive definite the minimum value of the
quadratic form in equation (9) is zero and can be achieved by setting $\mathbf{a}$ equal
to $\mathbf{a}^{(p)}$ where

$$ \mathbf{a}^{(p)} = \mathbf{R}^{-1} \mathbf{r} \tag{10} $$

The resulting minimum $e$ is given by

$$ e_{\min}^{(p)} = R_0 - \mathbf{r}^T \mathbf{R}^{-1} \mathbf{r} = R_0 - \sum_{k=1}^{p} a_k^{(p)} R_k \tag{11} $$

The use of the superscript $p$ to denote the minimizing $a_k$'s and $e_{\min}$ may
seem peculiar but the reason for this notation will become apparent in the next
section.

Equation (10) expresses the solution to a set of linear equations in matrix
notation. In ordinary notation, the equations to which equation (10) is the
solution are

$$ R_i - \sum_{k=1}^{p} a_k R_{i-k} = 0, \qquad i = 1, \ldots, p \tag{12} $$

These equations, called the autocorrelation normal equations, will play
a vital role in the sequel.
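
For small orders the normal equations (12) can be solved by any general linear solver. The Python sketch below uses plain Gaussian elimination with partial pivoting and then evaluates the minimum error from equation (11). It is meant only as a baseline against which a recursive solution can be compared; the function names are hypothetical and the input list `R` is assumed to hold $R_0, \ldots, R_p$.

```python
def solve_normal_equations(R, p):
    """Solve eq. (12), sum_k a_k R_{i-k} = R_i for i = 1..p, by Gaussian elimination."""
    # Augmented matrix [R | r]; entry (i, j) of the Toeplitz part is R_{|i-j|}.
    m = [[R[abs(i - j)] for j in range(p)] + [R[i + 1]] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, p):
            factor = m[row][col] / m[col][col]
            for j in range(col, p + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    a = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(m[i][j] * a[j] for j in range(i + 1, p))
        a[i] = (m[i][p] - tail) / m[i][i]
    return a


def minimum_error(R, a):
    """e_min = R_0 - sum_k a_k R_k, eq. (11)."""
    return R[0] - sum(a_k * R[k + 1] for k, a_k in enumerate(a))
```

This approach costs on the order of $p^3$ operations and ignores the Toeplitz structure of the correlation matrix, which is the structure that a recursive method can exploit.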
THE LEVINSON RECURSION


The autocorrelation normal equations (12) can be solved in a recursive
way by means of a technique known as Levinson's method.* To derive this
technique, first assume that the solution to the $n$th order autocorrelation
normal equations is known and denote it by $a_k^{(n)}$, $k = 1, \ldots, n$. Next, write
down the $(n+1)$st order equations in the form

$$
\begin{aligned}
R_i - \sum_{k=1}^{n} a_k^{(n+1)} R_{i-k} - a_{n+1}^{(n+1)} R_{i-n-1} &= 0, \qquad i = 1, \ldots, n \\
R_{n+1} - \sum_{k=1}^{n} a_k^{(n+1)} R_{n+1-k} - a_{n+1}^{(n+1)} R_0 &= 0
\end{aligned}
\tag{13}
$$


A neat way of getting at the Levinson recursion is to assume a solution to (13)
of the form

$$ a_k^{(n+1)} = a_k^{(n)} - b_k, \qquad k = 1, \ldots, n \tag{14} $$

with $a_{n+1}^{(n+1)}$ to be determined later. Substitution of (14) into the first
$n$ of equations (13) leads to the new equation

$$ \sum_{k=1}^{n} b_k R_{i-k} - a_{n+1}^{(n+1)} R_{i-n-1} = 0, \qquad i = 1, \ldots, n \tag{15} $$

Motivated by the fact that equations (15) look very much like the $n$th order
autocorrelation normal equations, the change of variable $j = n+1-i$ is made with
the result

$$ a_{n+1}^{(n+1)} R_j - \sum_{k=1}^{n} b_k R_{j+k-n-1} = 0, \qquad j = 1, \ldots, n \tag{16} $$

Next, the change of variable $\ell = n+1-k$ is made and (16) becomes

$$ a_{n+1}^{(n+1)} R_j - \sum_{\ell=1}^{n} b_{n+1-\ell} R_{j-\ell} = 0, \qquad j = 1, \ldots, n \tag{17} $$

Since equations (17) are a scaled version of the $n$th order autocorrelation normal
equations, their solution is evidently given by

$$ b_{n+1-\ell} = a_{n+1}^{(n+1)} a_\ell^{(n)}, \qquad \ell = 1, \ldots, n \tag{18} $$

and, therefore,

$$ a_k^{(n+1)} = a_k^{(n)} - a_{n+1}^{(n+1)} a_{n+1-k}^{(n)}, \qquad k = 1, \ldots, n \tag{19} $$

It only remains to see if a value of $a_{n+1}^{(n+1)}$ can be found such that
the last remaining equation in the set (13) can be satisfied. Using (19),
this equation now reads,

$$ R_{n+1} - \sum_{k=1}^{n} \left[ a_k^{(n)} - a_{n+1}^{(n+1)} a_{n+1-k}^{(n)} \right] R_{n+1-k} - a_{n+1}^{(n+1)} R_0 = 0 \tag{20} $$

This equation can be solved for $a_{n+1}^{(n+1)}$ with the result,

$$ a_{n+1}^{(n+1)} \equiv K_n = \frac{R_{n+1} - \sum_{k=1}^{n} a_k^{(n)} R_{n+1-k}}{R_0 - \sum_{k=1}^{n} a_k^{(n)} R_k} \tag{21} $$

This result is meaningful as long as the denominator is not zero. The
denominator is exactly equal to the minimum mean squared error for the
$n$th stage of the process, $e^{(n)}$, as given by equation (11). However, $e^{(n)}$ can never
be zero, for if it were, it would follow that $s_n = \sum_{k=1}^{n} a_k s_{n-k}$ for all $n$. Since
$s_n = 0$ for $n < 0$, this equation implies that $s_n = 0$ for all $n$. Since this case never
arises in practice, it follows that equation (21) is always meaningful.

* See reference 7.


The only ingredient missing to set this recursive process in motion is
a solution to the first order system and this can be written down by inspection
of (12) as

$$ a_1^{(1)} \equiv K_0 = \frac{R_1}{R_0} \tag{22} $$
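
The recursion defined by equations (19), (21) and (22) can be sketched in a few lines of Python. The function name and the list-based, zero-indexed storage are assumptions made here for illustration: `a[k]` holds $a_{k+1}^{(n)}$ and `R[k]` holds $R_k$.

```python
def levinson(R, p):
    """Return (a, K) with a = [a_1..a_p] solving eq. (12) and K = [K_0..K_{p-1}]."""
    a = []   # the order-0 predictor has no coefficients
    K = []
    for n in range(p):
        # Eq. (21); for n = 0 both sums are empty and this reduces to eq. (22).
        numerator = R[n + 1] - sum(a[k] * R[n - k] for k in range(n))
        denominator = R[0] - sum(a[k] * R[k + 1] for k in range(n))
        k_n = numerator / denominator
        # Eq. (19) for the first n coefficients, then a_{n+1}^(n+1) = K_n.
        a = [a[k] - k_n * a[n - 1 - k] for k in range(n)] + [k_n]
        K.append(k_n)
    return a, K
```

Each pass through the loop costs on the order of $n$ operations, so the complete solution of order $p$ is obtained in roughly $p^2$ operations rather than the $p^3$ of a general linear solver.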

For later considerations, it will be useful to rewrite the Levinson recursion
in terms of the inverse filter transfer function $H_n(z)$ instead of in terms of the
coefficients $a_k^{(n)}$ as given by equations (19) and (21). This recursion is easily
seen to be given by,

$$ H_{n+1}(z) = H_n(z) - K_n z^{-(n+1)} H_n(z^{-1}) \tag{23} $$


with $K_n$ being determined by the $R_k$'s via equation (21). The initial condition
for (23) is given by

$$ H_1(z) = 1 - \frac{R_1}{R_0} z^{-1} \tag{24} $$

It is evident from equation (22) that $|K_0| < 1$ and it turns out that this
is true for $K_n$ for all $n$. Since this fact will be vital in the sequel it will be proved
now.

To this end, it will be necessary to rewrite equation (21) in the z-transform
domain by making use of the easily verified identity,

$$ R_k = \sum_n s_n s_{n-k} = \int_{-1/2}^{1/2} e^{-j 2\pi f k} \left| S(e^{j 2\pi f}) \right|^2 df \tag{25} $$

where $S(z)$ denotes the z-transform of the speech samples

$$ S(z) = \sum_n s_n z^{-n} \tag{26} $$

In order to simplify notation, equation (25) will be rewritten as

$$ R_k = \int z^{-k} |S(z)|^2 \, df \tag{27} $$

where the convention in force here and in the sequel is that all integrals have
limits $(-\tfrac{1}{2}, \tfrac{1}{2})$ and whenever the variable $z$ appears under an integral sign, it is
understood to be equal to $e^{j 2\pi f}$. Equation (21) which defines $K_n$ now can be
rewritten in the form


$$
K_n = \frac{\int |S(z)|^2 \left[ z^{-(n+1)} - \sum_{k=1}^{n} a_k^{(n)} z^{-(n+1-k)} \right] df}
           {\int |S(z)|^2 \left[ 1 - \sum_{k=1}^{n} a_k^{(n)} z^{-k} \right] df}
    = \frac{\int |S(z)|^2 \, z^{-(n+1)} H_n(z^{-1}) \, df}{\int |S(z)|^2 \, H_n(z) \, df}
\tag{28}
$$


Since the denominator of this equation is the minimum mean squared error, it
follows that,

$$ e^{(n)} = \int |S(z)|^2 H_n(z) \, df \tag{29} $$

A recursion for $e^{(n)}$ can easily be derived by writing

$$
\begin{aligned}
e^{(n+1)} &= \int |S(z)|^2 H_{n+1}(z) \, df \\
          &= \int |S(z)|^2 \left[ H_n(z) - K_n z^{-(n+1)} H_n(z^{-1}) \right] df \\
          &= e^{(n)} - K_n \int |S(z)|^2 z^{-(n+1)} H_n(z^{-1}) \, df \\
          &= e^{(n)} \left[ 1 - K_n^2 \right]
\end{aligned}
\tag{30}
$$

where the last step follows from equation (28).

Since $e^{(n)}$ must always be positive, it follows from the last equation that
$|K_n| < 1$ as advertised.
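
Equation (30) also has a practical use: the denominator of equation (21) need not be recomputed at each stage, since it can be carried along from one order to the next. The Python sketch below is one way of arranging this. It assumes, as a convention not stated explicitly in the text, that the order-zero error is $e^{(0)} = R_0$, which is equation (11) with an empty sum; the function name is hypothetical.

```python
def levinson_with_error(R, p):
    """Levinson recursion in which the denominator of eq. (21) is updated by eq. (30)."""
    a, K = [], []
    e = R[0]                 # assumed convention: e^(0) = R_0 (eq. 11 with no terms)
    errors = [e]
    for n in range(p):
        numerator = R[n + 1] - sum(a[k] * R[n - k] for k in range(n))
        k_n = numerator / e                      # eq. (21) with denominator e^(n)
        a = [a[k] - k_n * a[n - 1 - k] for k in range(n)] + [k_n]   # eq. (19)
        e *= 1.0 - k_n * k_n                     # eq. (30)
        K.append(k_n)
        errors.append(e)
    return a, K, errors
```

The returned list `errors` holds $e^{(0)}, \ldots, e^{(p)}$. Because every factor $1 - K_n^2$ lies between zero and one, the sequence cannot increase, which gives a simple way of judging how much each additional coefficient improves the prediction.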

As an important application of the result that $|K_n| < 1$, it will be shown
that all the zeros of $H_n(z)$ lie strictly inside the unit circle, which implies
that the speech synthesis filters $[H_n(z)]^{-1}$ will always be stable. The
proof proceeds by induction by first noting that because a correlation function
is always maximum at the origin, $|R_1| < R_0$, it follows that $H_1(z)$, as
defined by equation (24), has its zero inside the unit circle. Next, assume that
$H_n(z)$ has its $n$ zeros inside the unit circle. Multiplying equation (23) by $z^{n+1}$ and
noting that, on the unit circle, $|z^{n+1} H_n(z)| = |H_n(z^{-1})|$, it follows from Rouché's
theorem* that $z^{n+1} H_{n+1}(z)$ and $z^{n+1} H_n(z)$ have equal numbers of zeros
inside the unit circle. Since $z^{n+1} H_n(z)$ has $n+1$ zeros inside the unit circle,
the proof of the statement follows by induction.

* Reference 10, p. 116.
The Nonuniform Acoustic Tube


Figure 1 depicts three sections of a nonuniform acoustic tube.* The
cross-sectional area of the $n$th section is $A_n$ and the length of all sections
is $\Delta$. The forward and backward components of the volume velocity measured
at the left-hand end of the $n$th section are sampled every $2\Delta/c$ seconds and the
z-transforms of these samples are denoted by $V_n^+(z)$ and $V_n^-(z)$. The constant
$c$ denotes the velocity of sound in the tube.

The relationship between the volume velocities in the $n$th and $(n+1)$st
sections can be determined by writing down the continuity equations for volume
velocity and acoustic pressure at the boundary between the $n$th and $(n+1)$st sections.

The z-transforms of the forward and backward volume velocities measured
at the right-hand end of the $n$th section are given by $z^{-1/2} V_n^+(z)$ and $z^{1/2} V_n^-(z)$
respectively. The continuity of volume velocity can now be expressed by the equation


$$ V_{n+1}^+(z) - V_{n+1}^-(z) = z^{-1/2} V_n^+(z) - z^{1/2} V_n^-(z) \tag{31} $$

Since the acoustic impedance of the $n$th section is given by $\rho c / A_n$ where $\rho$
denotes the density of air, the continuity of acoustic pressure is expressed
by the equation,

$$ \frac{\rho c}{A_{n+1}} \left[ V_{n+1}^+(z) + V_{n+1}^-(z) \right] = \frac{\rho c}{A_n} \left[ z^{-1/2} V_n^+(z) + z^{1/2} V_n^-(z) \right] \tag{32} $$

These equations can be solved for $V_{n+1}^+(z)$ and $V_{n+1}^-(z)$ with the result,

$$
\begin{aligned}
V_{n+1}^+(z) &= \frac{z^{-1/2} V_n^+(z) - r_n z^{1/2} V_n^-(z)}{1 + r_n} \\
V_{n+1}^-(z) &= \frac{-r_n z^{-1/2} V_n^+(z) + z^{1/2} V_n^-(z)}{1 + r_n}
\end{aligned}
\tag{33}
$$

where the reflection coefficient $r_n$ is defined by

$$ r_n = \frac{A_n - A_{n+1}}{A_n + A_{n+1}} \tag{34} $$

* This section is based primarily on reference 1. Note carefully that the numbering
of the tube sections differs from that in ref. 1 in that $n$ here corresponds to Wakita's
$N-n$.
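In matrix form these equations read,

$$
\begin{bmatrix} V_{n+1}^+(z) \\ V_{n+1}^-(z) \end{bmatrix}
= \frac{1}{1 + r_n}
\begin{bmatrix} z^{-1/2} & -r_n z^{1/2} \\ -r_n z^{-1/2} & z^{1/2} \end{bmatrix}
\begin{bmatrix} V_n^+(z) \\ V_n^-(z) \end{bmatrix}
\tag{35}
$$

Equation (35) can be inverted easily with the result,

$$
\begin{bmatrix} V_n^+(z) \\ V_n^-(z) \end{bmatrix}
= \frac{z^{1/2}}{1 - r_n}
\begin{bmatrix} 1 & r_n \\ r_n z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} V_{n+1}^+(z) \\ V_{n+1}^-(z) \end{bmatrix}
\tag{36}
$$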


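These equations can be conveniently normalized by introducing the
quantities

$$
U_n^+(z) = \frac{z^{n/2}}{\prod_{i=0}^{n-1} (1 - r_i)} V_n^+(z), \qquad
U_n^-(z) = \frac{z^{n/2}}{\prod_{i=0}^{n-1} (1 - r_i)} V_n^-(z)
\tag{37}
$$

in terms of which equation (36) becomes:

$$
\begin{bmatrix} U_n^+(z) \\ U_n^-(z) \end{bmatrix}
=
\begin{bmatrix} 1 & r_n \\ r_n z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} U_{n+1}^+(z) \\ U_{n+1}^-(z) \end{bmatrix}
\tag{38}
$$

The quantities $U_n^+(z)$ and $U_n^-(z)$ can be interpreted as the forward and backward
components of volume velocity in a fictitious acoustic tube which differs
from the real tube only in that a gain factor $\prod_{i=0}^{n-1} (1 - r_i)$ and an overall
delay $z^{-n/2}$ have been removed.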

Equation  (38)  can  be  used  to  derive  a  digital  network  whose  response  is  the  same  as  that 
of  the  acoustic  tube.  To  accomplish  this,  equation  (38)  is  first  rewritten  in  the 
form: 



$$
\begin{aligned}
U_{n+1}^+(z) &= U_n^+(z) - r_n U_{n+1}^-(z) \\
U_n^-(z) &= z^{-1} \left[ r_n U_{n+1}^+(z) + U_{n+1}^-(z) \right]
\end{aligned}
\tag{39}
$$

The digital network that is generated by equation (39) is shown in Figure 2.

This network as drawn is incomplete because no termination has been specified,
thus making it impossible to compute the sequence of backward going waves.

As an example of a termination (one that will play a role in the sequel) assume
the end of the tube is connected to a tube of infinite cross section and of infinite
length, i.e., free space filled with air. This means that the final reflection coefficient
is $-1$ and that there is no backward wave at the output. The network for this
arrangement is shown in Figure 3 with the inputs to the network being the output
of an N-section acoustic tube.

The next order of business is to compute the transfer function of an
N-section acoustic tube. This will be done for the tube termination depicted in
Figure 3, which implies that $U_{out}(z) = U_N^+(z)$. Since equation (38) enables
one to recursively compute the z-transforms of the forward and backward waves
in the $n$th section of the tube in terms of their counterparts in the $(n+1)$st section, it is
natural to assume a simple output z-transform and then compute the input z-transform
$U_0^+(z)$ that produced this output. If $U_{out}(z) = 1$ is assumed, then it follows that
$U_N^+(z) = 1$ and $U_N^-(z) = -z^{-1}$. Equation (38) is now employed $N$ times to arrive
at $U_0^+(z)$ and it follows that the tube's transfer function is

$$ T(z) = \frac{U_{out}(z)}{U_0^+(z)} = \frac{1}{U_0^+(z)} \tag{40} $$


The computation just described is related to the Levinson recursion in
a very important way. To make this fact clear, the Levinson recursion must
be rewritten by introducing the functions $G_n^+(z)$ and $G_n^-(z)$ defined by

$$
\begin{aligned}
G_n^+(z) &= H_n(z) \\
G_n^-(z) &= -z^{-(n+1)} H_n(z^{-1})
\end{aligned}
\tag{41}
$$


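In terms of these functions, the Levinson recursion, equation (23), can be written
as a set of two recursions as follows:

$$
\begin{aligned}
G_{n+1}^+(z) &= G_n^+(z) + K_n G_n^-(z) \\
G_{n+1}^-(z) &= z^{-1} \left[ K_n G_n^+(z) + G_n^-(z) \right]
\end{aligned}
\tag{42}
$$

or, in matrix form,

$$
\begin{bmatrix} G_{n+1}^+(z) \\ G_{n+1}^-(z) \end{bmatrix}
=
\begin{bmatrix} 1 & K_n \\ K_n z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} G_n^+(z) \\ G_n^-(z) \end{bmatrix}
\tag{43}
$$

The initial condition for the recursion is now

$$ G_0^+(z) = 1, \qquad G_0^-(z) = -z^{-1} \tag{44} $$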


A comparison of equations (42) and (38) reveals that these two recursions
are identical in form except that the indexing of the two is reversed, i.e.,
the acoustic tube indexing is from $n = N-1$ to $n = 0$ but the Levinson recursion
indexes from $n = 0$ to $n = N-1$. Moreover, comparison of equation (44) and the
initial conditions used for computing the acoustic tube's transfer function shows
that these are also identical. What this all means is that an acoustic tube
with reflection coefficients given by $r_{N-n} = K_{n-1}$ has a transfer function given by

$$ T(z) = [H_N(z)]^{-1} \tag{45} $$

In other words, since the Levinson recursion yields the best estimate of the filter
inverse to the filter that produced the original speech samples, the acoustic tube
filter discussed above has a transfer function that is an estimate of the filter
that originally produced the speech. Thus, this acoustic tube filter is a natural
candidate for a filter to synthesize speech.
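
The identity of equation (45) can be spot-checked numerically at individual points on the unit circle. The Python sketch below applies equation (38) from section $N$ back to section 0, starting from $U_N^+ = 1$ and $U_N^- = -z^{-1}$, and compares the resulting $U_0^+(z)$ with $H_N(z)$ built from the same reflection coefficients by equation (19). All function names are hypothetical, and the list `r` is assumed to hold $r_0, \ldots, r_{N-1}$.

```python
import cmath


def step_up(K):
    """Build a_1..a_N from K_0..K_{N-1} using eq. (19) with a_{n+1}^(n+1) = K_n."""
    a = []
    for n, k_n in enumerate(K):
        a = [a[k] - k_n * a[n - 1 - k] for k in range(n)] + [k_n]
    return a


def inverse_filter(a, z):
    """H_N(z) = 1 - sum_k a_k z^{-k}, eq. (2)."""
    return 1.0 - sum(a_k * z ** -(k + 1) for k, a_k in enumerate(a))


def tube_input_wave(r, z):
    """Run eq. (38) from section N down to section 0 and return U_0^+(z)."""
    u_plus, u_minus = 1.0, -1.0 / z          # U_N^+ = 1, U_N^- = -z^{-1}
    for n in reversed(range(len(r))):
        # Both right-hand sides use the section n+1 values (tuple assignment).
        u_plus, u_minus = (u_plus + r[n] * u_minus,
                           (r[n] * u_plus + u_minus) / z)
    return u_plus


def tube_matches_inverse_filter(K, frequency, tolerance=1e-12):
    """Check U_0^+(z) = H_N(z) at z = exp(j 2 pi f) with r_{N-1-n} = K_n."""
    z = cmath.exp(2j * cmath.pi * frequency)
    r = list(reversed(K))
    return abs(tube_input_wave(r, z) - inverse_filter(step_up(K), z)) < tolerance
```

Reversing the list of $K_n$'s is the code's counterpart of the reversed indexing noted above: the first Levinson coefficient $K_0$ ends up in the tube section nearest the output.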


Atal (reference 8) has given a different derivation of the transfer function of
a nonuniform acoustic tube. His derivation leads to the transfer function given
by equation (45); however, his acoustic tube differs from the one derived above
mainly in that the input and output terminals are interchanged. In other words,
the reflection coefficient $K_0$, which appears at the output end of the acoustic tube
derived above, appears at the input end of Atal's acoustic tube. Mathematically
there does not seem to be any reason to choose one of these acoustic tubes over
the other since they have identical transfer functions; however, Wakita's
tube seems more natural as a model of the vocal tract. This follows from
the fact that Wakita's output termination is an infinite cross-section tube
which appears correct for modelling the interface between the lips and the
outside world.


It has now been demonstrated how speech data can be used to derive
a set of filter coefficients $a_k^{(n)}$ and a set of reflection coefficients $K_n$.
The former could be used in a direct-form realization of a speech synthesis
filter whereas the latter could be used to synthesize an acoustic tube synthesis
filter. Which of these realizations is better is still a topic for investigation.
For the sake of completeness, this section will conclude by showing how
an arbitrary, stable all-pole filter $[H_N(z)]^{-1}$ can be realized as an
acoustic tube.

The basic tool for this demonstration is the so-called backward Levinson
recursion which can be derived from the forward Levinson recursion, equation (23),
as follows. Solving equation (23) for $H_n(z)$ yields the relation,

$$ H_n(z) = H_{n+1}(z) + K_n z^{-(n+1)} H_n(z^{-1}) \tag{46} $$

Next, replace $z$ by $z^{-1}$ in equation (23) and solve again for $K_n H_n(z)$ with the
result:

$$ K_n H_n(z) = z^{-(n+1)} \left[ H_n(z^{-1}) - H_{n+1}(z^{-1}) \right] \tag{47} $$

The elimination of $H_n(z^{-1})$ between equations (46) and (47) leads to the
desired result:

$$ H_n(z) = \frac{1}{1 - K_n^2} \left[ H_{n+1}(z) + K_n z^{-(n+1)} H_{n+1}(z^{-1}) \right] \tag{48} $$

Since the constant term in $H_n(z)$ is unity, it follows from equation (48) that

$$ K_n = - \left. z^{n+1} H_{n+1}(z) \right|_{z=0} \tag{49} $$


Let $H_N(z)$ denote an arbitrary $N$th order polynomial in $z^{-1}$ with constant
term equal to unity. Furthermore, assume that all the zeros of $H_N(z)$ lie strictly
inside the unit circle so that $[H_N(z)]^{-1}$ is the transfer function of a stable,
all-pole filter. Since all the zeros of $H_N(z)$ are inside the unit circle and since the
coefficient of $z^{-N}$ in $H_N(z)$ is, up to sign, the product of all the zeros of $H_N(z)$,
it follows that $K_{N-1}$ as given by equation (49) satisfies $|K_{N-1}| < 1$.

Assume next that the backward Levinson recursion, equation (48),
has been implemented $n$ times and that $|K_{N-n}| < 1$ and that the polynomial
$H_{N-n+1}(z)$ has a constant term equal to unity and that all its zeros lie inside the
unit circle. It now follows from an application of Rouché's theorem that
$H_{N-n}(z)$ as given by equation (48) has all of its zeros inside the unit circle
and, therefore, that $|K_{N-n-1}| < 1$. The details of this argument will not be
given here because they are virtually identical to those given earlier when it
was shown that the forward Levinson recursion leads to stable filters as long
as the $K_n$'s used satisfy $|K_n| < 1$. It now follows by induction that all
the $K_n$'s produced by the backward Levinson recursion, equations (48) and (49),
satisfy $|K_n| < 1$ as long as the starting polynomial $H_N(z)$ had all of its zeros
inside the unit circle.

Since it is obvious that a forward Levinson recursion using the $K_n$'s derived
from a backward Levinson recursion will yield back the starting polynomial
$H_N(z)$, it follows from the discussion earlier in this section that a properly
terminated acoustic tube having these $K_n$'s as reflection coefficients will have
a transfer function given by $[H_N(z)]^{-1}$. It has thus been shown how an
arbitrary, stable all-pole filter can be realized as an acoustic tube.
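
Written out coefficient by coefficient, equation (48) reads $a_k^{(n)} = [a_k^{(n+1)} + K_n a_{n+1-k}^{(n+1)}] / (1 - K_n^2)$, with $K_n = a_{n+1}^{(n+1)}$ from equation (49). The Python sketch below applies this step repeatedly to recover the reflection coefficients from a given set of filter coefficients. The function names are hypothetical, and the choice to raise an exception when some $|K_n| \ge 1$ is a design decision made here, motivated by the argument above that a stable filter can only produce $|K_n| < 1$.

```python
def step_up(K):
    """Forward recursion, eq. (19): build a_1..a_N from K_0..K_{N-1}."""
    a = []
    for n, k_n in enumerate(K):
        a = [a[k] - k_n * a[n - 1 - k] for k in range(n)] + [k_n]
    return a


def step_down(a):
    """Backward recursion, eqs. (48) and (49): recover K_0..K_{N-1} from a_1..a_N."""
    a = list(a)
    K = []
    while a:
        k_n = a[-1]                      # eq. (49): K_n = a_{n+1}^(n+1)
        if abs(k_n) >= 1.0:
            raise ValueError("|K_n| >= 1: the all-pole filter is not stable")
        n = len(a) - 1
        # Eq. (48) applied to the coefficients of z^{-1} .. z^{-n}.
        a = [(a[k] + k_n * a[n - 1 - k]) / (1.0 - k_n * k_n) for k in range(n)]
        K.append(k_n)
    K.reverse()                          # the recursion yields K_{N-1} first
    return K
```

Applying `step_down` to the output of `step_up` should return the original reflection coefficients to within rounding error, which is the round trip described in the preceding paragraph.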

The  Orthogonal  Polynomial  Approach 


The theory that has been presented is complete in itself; however, it
should be pointed out that the results that have been derived are often arrived at
in the literature by a completely different path making use of the theory of polynomials
orthogonal on the unit circle.* The details of this alternate approach will now
be presented. The first part of this section will deal exclusively with the theory
of these polynomials, with the connection to the speech problem being made later.


* This section is based on references 3, 4 and 8.


A weighting function $w(z)$ is defined to be any function that satisfies
$w(z) \ge 0$ on the unit circle and, in addition, satisfies

$$ \int w(z) \, df > 0 \tag{50} $$

A finite or infinite set of polynomials,

$$ \varphi_n(z) = \sum_{k=0}^{n} a_{nk} z^k, \qquad n = 0, 1, \ldots \tag{51} $$

is said to be orthogonal with respect to the weighting function $w(z)$ on the unit
circle if

$$
\begin{aligned}
&\text{a)} \quad a_{nn} > 0, \qquad n = 0, 1, \ldots \\
&\text{b)} \quad \int \varphi_n(z) \, \overline{\varphi_m(z)} \, w(z) \, df = \delta_{nm}
\end{aligned}
\tag{52}
$$

In equation (52), the overbar denotes complex conjugation and $\delta_{nm}$ the Kronecker
delta.

It will now be shown that, given any weighting function, there exists a set
of polynomials satisfying conditions a) and b). The proof will proceed by induction
by defining,

$$ \varphi_0(z) = c_0^{-1/2} \tag{53} $$

where

$$ c_0 = \int w(z) \, df \tag{54} $$

The set of polynomials consisting of $\varphi_0(z)$ alone obviously satisfies a) and b).

Assume now that a set of $N$ polynomials satisfying a) and b) has been
constructed and enlarge this set by one by defining

$$ \varphi_N(z) = A \left[ z^N - \sum_{k=0}^{N-1} \alpha_k \varphi_k(z) \right] \tag{55} $$

where $A$ and the $\alpha_k$'s are to be determined.
k 
It follows that

$$ \int \varphi_N(z) \, \overline{\varphi_\ell(z)} \, w(z) \, df = A \left[ \int z^N \, \overline{\varphi_\ell(z)} \, w(z) \, df - \alpha_\ell \right], \qquad \ell = 0, \ldots, N-1 \tag{56} $$

It is now obvious from equation (56) that condition b) will be satisfied by
defining

$$
\alpha_\ell = \int z^N \, \overline{\varphi_\ell(z)} \, w(z) \, df, \qquad
A^{-2} = \int \left| z^N - \sum_{k=0}^{N-1} \alpha_k \varphi_k(z) \right|^2 w(z) \, df
\tag{57}
$$

The last equation is meaningful only if the integral appearing in it doesn't
vanish, which is always the case because it is well known that the powers of $z$ form
a linearly independent set. Finally, if the positive square root is always taken
in equation (57), it follows that condition a) is also satisfied by the enlarged
set of polynomials. The proof of existence is complete.
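
The construction of equations (53) to (57) is a Gram-Schmidt procedure and can be carried out numerically. The Python sketch below does so under an assumption made here purely for illustration: the weighting function is taken to be $w(z) = |S(z)|^2$, so that by equation (27) the integral of $z^i \bar{z}^j w(z)$ is the autocorrelation value $R_{i-j}$ and no numerical integration is needed. Polynomials are stored as lists of real coefficients in ascending powers of $z$; the function names are hypothetical.

```python
import math


def inner(p, q, R):
    """Integral of p(z) conj(q(z)) w(z) df for real coefficient lists p and q.

    Assumes w(z) = |S(z)|^2, so the moment of z^i conj(z^j) is R[|i - j|].
    """
    return sum(p_i * q_j * R[abs(i - j)]
               for i, p_i in enumerate(p) for j, q_j in enumerate(q))


def orthonormal_polynomials(R, order):
    """Construct phi_0 .. phi_order following eqs. (53) to (57)."""
    phis = []
    for n in range(order + 1):
        monomial = [0.0] * n + [1.0]                 # coefficients of z^n
        residual = list(monomial)
        for phi in phis:
            alpha = inner(monomial, phi, R)          # alpha_k of eq. (57)
            for i, coefficient in enumerate(phi):
                residual[i] -= alpha * coefficient   # bracket of eq. (55)
        # A of eq. (57), taking the positive square root so that a_nn > 0.
        scale = 1.0 / math.sqrt(inner(residual, residual, R))
        phis.append([scale * c for c in residual])
    return phis
```

With `order = 0` the routine returns the single constant $R_0^{-1/2}$, which is equation (53) with $c_0 = R_0$ under the stated assumption about $w(z)$.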

Next it will be shown that a set of polynomials satisfying a) and b)
is unique. Assume the contrary. Then there exist two different sets of polynomials
$\varphi_n(z)$ and $\varphi'_n(z)$ both satisfying a) and b). Next, note that it follows from
condition b) that $z^n$ can be written as a linear combination of $\varphi_n(z), \varphi_{n-1}(z), \ldots,
\varphi_0(z)$. (This is obvious for $n = 0$ and follows by a simple induction for the other
powers of $z$.) This fact in turn implies that

$$ \int \varphi_n(z) \, \overline{z^k} \, w(z) \, df = 0, \qquad k = 0, 1, \ldots, n-1 \tag{58} $$

Now, because there are two sets of polynomials satisfying a) and b), it follows
that the polynomial

$$ p(z) = \varphi_n(z) - \frac{k_n}{k'_n} \varphi'_n(z) \tag{59} $$

where $k_n$ and $k'_n$ denote the coefficient of $z^n$ in $\varphi_n(z)$ and $\varphi'_n(z)$, respectively, is of
degree no higher than $n-1$.
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From  this  fact  and  equation  (58),  it  follows  that 

2 


/I' 


p  (z)  w(z)  df 


=  J  On(z)  -  <Pn  (z)  p(z)  w(z)  df 
k 

=  0  n 

and,  therefore,  that  p(z)  =  0  which  implies  that 


(60) 


O  (z)  =  n  *  (z) 
n  —  n 

k* 


(61) 


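However, $k_n = k'_n$ because

$$1 = \int |\phi_n(z)|^2\, w(z)\, df = \int |\phi'_n(z)|^2\, w(z)\, df \tag{62}$$

which, together with equation (61), gives $(k_n / k'_n)^2 = 1$, and both
coefficients are positive by condition a). The uniqueness of any set of
polynomials satisfying a) and b) has therefore been established.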

It is now possible to establish a number of important properties of orthogonal
polynomials. The first of these is the fact that all the zeros of a set of
polynomials satisfying a) and b) lie inside the unit circle. To prove this fact,
let $z_0$ be a zero of $\phi_n(z)$, i.e., $\phi_n(z_0) = 0$. The polynomial
$\phi_n(z)/(z - z_0)$ is then of degree n-1 and it follows from equation (58)
that

$$\int \phi_n(z)\, \overline{\left( \frac{\phi_n(z)}{z - z_0} \right)}\, w(z)\, df = 0 \tag{63}$$


Equation (63) can easily be rewritten in the form

$$\int (z - z_0) \left| \frac{\phi_n(z)}{z - z_0} \right|^2 w(z)\, df = 0 \tag{64}$$


from which it follows that

$$z_0 \int \left| \frac{\phi_n(z)}{z - z_0} \right|^2 w(z)\, df = \int z \left| \frac{\phi_n(z)}{z - z_0} \right|^2 w(z)\, df \tag{65}$$

Since $\bar{z} = z^{-1}$ on the unit circle, a simple application of the Schwarz
inequality to equation (65) now shows that $|z_0| < 1$, where the strict
inequality follows from the fact that z is not proportional to unity on the unit
circle. This proves the theorem.
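
The theorem can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than part of the original argument: the weighting function is real and is represented by the autocorrelation lags R[n] of a made-up sequence `s`, and the orthonormal polynomials are obtained from a Cholesky factorization of the Toeplitz matrix of those lags (if T = L L^T, the rows of the inverse of L are coefficient vectors that are orthonormal with respect to T).

```python
import numpy as np

# Illustrative data only (an assumption, not taken from the report).
s = np.array([1.0, 0.5, -0.3, 0.8, -0.2, 0.1])
R = np.correlate(s, s, mode="full")[len(s) - 1:]   # lags 0, 1, ..., 5

order = 4
T = np.array([[R[abs(j - k)] for k in range(order + 1)]
              for j in range(order + 1)])

# Rows of inv(L) hold the coefficients of phi_0, ..., phi_order
# (index = power of z), with positive leading coefficients 1 / L[n, n].
L = np.linalg.cholesky(T)
Phi = np.linalg.inv(L)

max_radius = []
for n in range(1, order + 1):
    # np.roots expects the highest power first, hence the reversal.
    roots = np.roots(Phi[n, :n + 1][::-1])
    max_radius.append(float(np.max(np.abs(roots))))
```

Every entry of `max_radius` should be strictly less than one, in agreement with the theorem.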

The next fact to be established provides the link between the theory
of orthogonal polynomials and the speech problem introduced earlier. The property
of orthogonal polynomials that accomplishes this is embodied in the statement
that $\phi_n(z)$, scaled to have unit leading coefficient, minimizes the integral

$$\int |p_n(z)|^2\, w(z)\, df \tag{66}$$


where the minimum is taken over all polynomials of the form
$p_n(z) = z^n + a_{n-1} z^{n-1} + \cdots + a_0$. The minimum itself is
$k_n^{-2}$, where $k_n$ denotes the coefficient of $z^n$ in $\phi_n(z)$.

The proof of this statement can be established by first noting that
since $z^n$ can be written as a linear combination of
$\phi_n(z), \ldots, \phi_0(z)$, it follows that any $p_n(z)$ can be represented as

$$p_n(z) = \sum_{k=0}^{n} v_k\, \phi_k(z) \tag{67}$$

where $v_n = k_n^{-1}$ in order to force the coefficient of $z^n$ in $p_n(z)$ to
be unity.

Substitution of equation (67) in equation (66) yields

$$\int |p_n(z)|^2\, w(z)\, df = \sum_{k=0}^{n} |v_k|^2 \ge |v_n|^2 = k_n^{-2} \tag{68}$$

However, the lower bound given in equation (68) can be achieved by setting
$v_k = 0$, k = 0, ..., n-1, and the proof of the minimization property of
orthogonal polynomials follows.
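
The minimization property lends itself to a direct numerical check. The following sketch is only an illustration under assumptions: the weighting function is real with moments R[n] taken from a made-up sequence `s`, so that the integral (66) for a polynomial with coefficient vector c (index = power of z) becomes the quadratic form c^T T c, where T is the Toeplitz matrix of the lags.

```python
import numpy as np

# Illustrative data only (an assumption, not taken from the report).
s = np.array([1.0, 0.5, -0.3, 0.8, -0.2, 0.1])
R = np.correlate(s, s, mode="full")[len(s) - 1:]

n = 3
T = np.array([[R[abs(j - k)] for k in range(n + 1)] for j in range(n + 1)])

# Leading coefficient k_n of phi_n from the Cholesky factor T = L L^T.
L = np.linalg.cholesky(T)
k_n = 1.0 / L[n, n]
bound = k_n ** -2                      # claimed minimum, equation (68)

# Direct minimization of c^T T c over monic c (c[n] = 1): setting the
# gradient with respect to c[0..n-1] to zero gives a linear system.
low = np.linalg.solve(T[:n, :n], -T[:n, n])
c_opt = np.append(low, 1.0)
direct_min = float(c_opt @ T @ c_opt)

# Random monic polynomials should never fall below the bound.
rng = np.random.default_rng(0)
trials = [np.append(rng.normal(size=n), 1.0) for _ in range(200)]
values = [float(c @ T @ c) for c in trials]
```

Here `direct_min` should agree with `bound`, `c_opt` should equal the coefficients of phi_n divided by k_n, and no entry of `values` should be smaller than `bound`.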

The connection to the speech problem now follows by recalling that this problem
boiled down to minimizing the mean-squared error given by equation (4). Using
Parseval's theorem, this equation can be rewritten in the z-transform domain with
the result

$$e = \int |H_p(z)|^2\, |S(z)|^2\, df \tag{69}$$

where

$$H_p(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{70}$$

Since $|z^p| = 1$ on the unit circle, minimizing the integral in equation (69)
is the same as minimizing the integral given by

$$\int |z^p H_p(z)|^2\, |S(z)|^2\, df \tag{71}$$


But $z^p H_p(z)$ is a pth order polynomial with lead coefficient unity, and it
follows from the above minimization property of orthogonal polynomials that the
minimum of (71) is given by $k_p^{-2}$ and is achieved when

$$z^p H_p(z) = k_p^{-1}\, \phi_p(z) \tag{72}$$

Here, $\phi_p(z)$ denotes the pth orthogonal polynomial with respect to the
weighting function given by

$$w(z) = |S(z)|^2 \tag{73}$$
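
The link expressed by equations (69) through (73) can be illustrated with a short numerical sketch. It rests on assumptions that are not taken from the report: the speech segment is a made-up sequence `s` treated as zero outside its support, the error is the unnormalized sum of squared prediction residuals, and the orthonormal polynomial is obtained from a Cholesky factorization of the autocorrelation matrix.

```python
import numpy as np

# Illustrative "speech" segment (an assumption, not taken from the report).
s = np.array([1.0, 0.5, -0.3, 0.8, -0.2, 0.1])
p = 3

# Moments of w(z) = |S(z)|^2 are the autocorrelation lags of s.
R = np.correlate(s, s, mode="full")[len(s) - 1:]
T = np.array([[R[abs(j - k)] for k in range(p + 1)] for j in range(p + 1)])

# Row p of inv(L) holds the coefficients of phi_p (index = power of z).
Phi = np.linalg.inv(np.linalg.cholesky(T))
k_p = Phi[p, p]
monic = Phi[p] / k_p                 # coefficients of z^p H_p(z), equation (72)

# z^p H_p(z) = z^p - sum_k a_k z^(p-k), hence a_k = -monic[p - k].
a = -monic[:p][::-1]

# Time-domain check: filter s with the inverse filter 1, -a_1, ..., -a_p.
h = np.concatenate(([1.0], -a))
err_time = float(np.sum(np.convolve(s, h) ** 2))

# Predictor coefficients from the normal equations, for comparison.
a_normal = np.linalg.solve(T[:p, :p], R[1:p + 1])
```

Under these assumptions `err_time` should equal `k_p ** -2`, and `a` should coincide with `a_normal`.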

The above argument has transformed the speech problem under consideration
from one of minimizing a certain integral to one of finding the pth order
orthogonal polynomial with respect to the weighting function $|S(z)|^2$. There
exist explicit expressions for the polynomials orthogonal with respect to an
arbitrary weighting function; however, their evaluation requires the computation
of large determinants. A computationally more attractive approach to the
evaluation of the coefficients of $\phi_p(z)$ is available because of the
existence of a recursion formula for the orthogonal polynomials. The existence of
such a recursion formula should come as no surprise; in fact, from the discussion
in the previous section, it should be obvious that the desired recursion must be
equivalent to the Levinson recursion.

To derive this new version of the recursion, substitute equation (72) into the
Levinson recursion, equation (23), with the result

$$k_{n+1}^{-1}\, \phi_{n+1}(z) = k_n^{-1}\, z\, \phi_n(z) - K_n\, k_n^{-1}\, z^n \phi_n(z^{-1}) \tag{74}$$

Next, the fact that $k_n^{-2}$ is the mean-squared error at the nth stage,
coupled with equation (30), yields the final recursion formula

$$\phi_{n+1}(z) = (1 - K_n^2)^{-1/2} \left[ z\, \phi_n(z) - K_n\, z^n \phi_n(z^{-1}) \right] \tag{75}$$

The $K_n$'s appearing in equation (75) are still given by equation (21), where
now

$$R_n = \int z^n\, w(z)\, df \tag{76}$$
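
A recursion of this kind can be sketched in a few lines and compared against a direct construction. The sketch below is illustrative only and makes several assumptions: the data are real, the moments R[n] come from a made-up sequence `s`, and, because equation (21) is not reproduced here, the coefficient K_n is computed by requiring that the new polynomial be orthogonal to the constant polynomial, which for real data gives K_n = k_n * sum_j phi_n[j] * R[j + 1]. The function name `orthopoly_recursion` is hypothetical.

```python
import numpy as np

def orthopoly_recursion(R, order):
    # phi is a coefficient vector indexed by the power of z.
    phi = np.array([1.0 / np.sqrt(R[0])])            # phi_0 is a constant
    polys, refl = [phi], []
    for n in range(order):
        k_n = phi[-1]                                # leading coefficient
        # Assumed form of K_n (orthogonality of phi_{n+1} to z^0).
        K = k_n * float(np.dot(phi, R[1:n + 2]))
        z_phi = np.concatenate(([0.0], phi))         # z * phi_n(z)
        rev = np.concatenate((phi[::-1], [0.0]))     # z^n * phi_n(1/z)
        phi = (z_phi - K * rev) / np.sqrt(1.0 - K ** 2)
        polys.append(phi)
        refl.append(K)
    return polys, refl

# Illustrative data only (an assumption, not taken from the report).
s = np.array([1.0, 0.5, -0.3, 0.8, -0.2, 0.1])
R = np.correlate(s, s, mode="full")[len(s) - 1:]

order = 4
polys, refl = orthopoly_recursion(R, order)

# Reference: orthonormal polynomials from a Cholesky factorization.
T = np.array([[R[abs(j - k)] for k in range(order + 1)]
              for j in range(order + 1)])
Phi = np.linalg.inv(np.linalg.cholesky(T))
```

Each `polys[n]` should match the first n + 1 entries of row n of `Phi`, and every coefficient in `refl` should have magnitude less than one.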

Conclusion 

The basic mathematics relating to the linear predictive filtering approach
to speech analysis/synthesis has now been presented. The analysis began by
postulating that speech is produced by exciting an all-pole filter with either a
uniform impulse train or white noise. A minimum mean-squared error technique for
estimating the parameters of an all-pole filter from a segment of speech data
was then introduced and an explicit expression for this filter in terms of the
speech data was derived.

Next, a numerically attractive recursive technique for computing this filter
was derived and it was shown that this filter must always be stable. This filter
can be realized in a variety of ways, such as direct form or cascade form; in
addition, it was demonstrated that it can also be realized as a non-uniform
acoustic tube. The reflection coefficients defining this tube are generated as a
matter of course when computing the filter by means of the recursive technique
just mentioned.


REFERENCES 


1. H. Wakita, "Estimation of the Vocal Tract Shape by Optimal
Inverse Filtering and Acoustic/Articulatory Conversion Methods,"
Speech Communications Research Laboratory, Monograph No. 9,
Santa Barbara, California.

2. J. I. Makhoul, J. J. Wolf, "Linear Prediction and the Spectral
Analysis of Speech," Bolt Beranek and Newman Report No. 2304,
Cambridge, Mass.

3. L. Ya. Geronimus, Orthogonal Polynomials (Consultants Bureau, New York,
1961).

4. G. Szegő, Orthogonal Polynomials, American Mathematical Society
Colloquium Publications, Vol. XXIII, New York (1959).

5. U. Grenander, G. Szegő, Toeplitz Forms and Their Applications (University
of California Press, Berkeley and Los Angeles, 1958).

6. J. D. Markel, A. H. Gray, Jr., "Autocorrelation Equations as Applied
to Speech Analysis," IEEE Trans. Audio Electroacoust. AU-21 (1973).

7. N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary
Time Series (The Technology Press and J. Wiley and Sons, New York,
1957), Appendix B.

8. B. S. Atal, S. L. Hanauer, "Speech Analysis by Linear Prediction of the
Speech Wave," J. Acoust. Soc. Am. 50 (1971).

9. F. Itakura, S. Saito, "Digital Filtering Techniques for Speech Analysis and
Synthesis," Seventh International Congress on Acoustics, Budapest (1971).

10. E. C. Titchmarsh, The Theory of Functions, Second Edition (Oxford
University Press, 1939).